Cross-lingual hate speech detection using domain-specific word embeddings

THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS.
Hate speech detection in online social networks is a multidimensional problem, dependent on language and cultural factors. Most supervised learning resources for this task, such as labeled datasets and Natural Language Processing (NLP) tools, have been specifically tailored for English. However, a large portion of web users around the world speak different languages, creating an important need for efficient multilingual hate speech detection approaches. In particular, such approaches should be able to leverage the limited cross-lingual resources that currently exist in their learning process. Cross-lingual transfer in this task has been difficult to achieve successfully. Therefore, we propose a simple yet effective method to approach this problem. To our knowledge, ours is the first attempt to create a multilingual embedding model specific to this problem. We validate the effectiveness of our approach by performing an extensive comparative evaluation against several well-known general-purpose language models that, unlike ours, have been trained on massive amounts of data. We focus on a zero-shot cross-lingual evaluation scenario in which we classify hate speech in one language without having access to any labeled data in that language. Despite their simplicity, our embeddings outperform more complex models in most of the experimental settings we tested. In addition, we provide further evidence of the effectiveness of our approach through an ad hoc qualitative exploratory analysis, which captures how hate speech is displayed in different languages. This analysis allows us to find new cross-lingual relations between words in the hate-speech domain. Overall, our findings indicate common patterns in how hate speech is expressed across languages and that our proposed model can capture such relationships.


Introduction
This article uses words or language that is considered profane, vulgar or offensive by some readers. Due to the topic studied in this article, quoting offensive language is academically justified, but neither we nor PLOS in any way endorse the use of these terms. Likewise, the terms do not represent our opinions or those of PLOS, and we condemn online harassment and offensive language.
Timely information dissemination and other types of human communication take place on the Web, especially on online social media platforms. Along with many useful information exchanges, there are also manifestations of communication disorders, such as fake news and hate speech, which can produce harmful side effects. In particular, hate speech can be understood as language that expresses prejudice against particular groups of people. It is a phenomenon related to human behavior that spans cultures and languages and can seriously limit the participation of certain groups in social media activity.
Existing solutions for multilingual hate speech detection are considerably narrow, mostly because hate speech research has been conducted primarily in English [1][2][3][4]. As a consequence, there is a considerable scarcity of labeled data, lexical resources, and models beyond the English scope. There are some recent efforts to systematically address multilingual aspects of hate speech detection, most of which rely on emerging multilingual tools such as general-purpose text representations [5][6][7]. However, as an emergent topic, there is still no consensus on how to undertake this issue effectively in low-resource languages. In this regard, our work focuses on investigating solutions that can leverage data from high-resource languages to improve performance for other languages with little or no resources. In particular, we address zero-shot multilingual learning, where there are no resources directly available for learning a task in a particular target language; hence a different language needs to be used for this purpose.
We hypothesize that general-purpose multilingual word embeddings do not necessarily capture patterns that naturally arise when words are used with a hateful intent. For instance, words related to nationality, religion, and race will mostly have a neutral connotation in general written text. Nevertheless, these same words can be loaded with hateful meanings when they are used in text that contains hate speech [8]. Following this intuition, we propose a set of multilingual word embeddings that have been specifically created for hate speech. To achieve this, we created different hate speech word embedding feature spaces in different languages and aligned them in an unsupervised way using a projection technique [9].
We evaluated the effectiveness of our embedding model for hate speech detection relative to general-purpose representations, using them as input features for classification in English, Spanish, and Italian.
Our findings show that the use of our hate-specific representations mostly improved cross-lingual classification model performance compared to the other representations.
In addition, we introduce a qualitative exploratory analysis of word contexts for our hate-specific embeddings.
This analysis suggests that besides the information provided by translating the general meaning of words to different languages, there are more specific cross-cutting patterns in how hate speech is displayed in those languages.
For instance, for a general-purpose multilingual embedding, the natural (context-based) translation for the Italian word "migranti" is "migrants" in English and "migrantes" in Spanish. In contrast, the translation in our hate-specific embeddings is "illegals" in English and "palestinos" in Spanish.
These patterns allow us to transfer knowledge from one language to another when detecting hate speech. Moreover, our approach requires very little data in contrast to other representations, which have been trained on massive amounts of text.

Problem
Existing multilingual hate speech detection solutions predominantly rely on general-purpose embeddings, often trained in English. In contrast, languages that are low-resource for the hate speech detection task, like Spanish, remain understudied. This study addresses the gap by introducing

Monolingual hate speech detection
As for other NLP tasks, English has been the most addressed language in hate speech detection. In the related literature, several methods are leveraged for in-domain English evaluation. Some of these approaches, mainly in the early years of the task's development, use traditional machine-learning models [3,10,11] such as Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF). These algorithms are commonly applied using existing software tools such as WEKA (https://www.cs.waikato.ac.nz/ml/weka/) and scikit-learn (http://scikit-learn.org/stable/index.html), combined with engineered features [12,13].
In addition, there have been critical analyses of English-based systems and datasets to provide a better understanding of the problem and the possible biases existing in datasets and models [26][27][28].

Cross-lingual hate speech detection
As demonstrated for other tasks [29,30], a multilingual approach to the hate speech detection problem could help improve the state of the art for under-represented languages.
For languages with little to no labeled resources, we need an approach in which no information about the target language is used during the training process. We refer to this constrained scenario as cross-lingual.
Translating training and testing data into a common language is one of the strategies employed for this task [31]. Meta-information from network dynamics and text messages has been used as features in the literature [26]. This type of feature is considered language-independent, as it is not directly related to the language in which the text is written. LASER [32] is a recently proposed model for producing multilingual embeddings for sentences. These embeddings have been combined with traditional machine learning models [6,7] for cross-lingual hate speech detection. Another approach involves fine-tuning pre-trained multilingual models like BERT [33] (https://github.com/huggingface/transformers) [5,6,34] or XLM (https://github.com/facebookresearch/XLM) [35] on the training data.

Specific embeddings for hate speech detection
There is limited research on specialized word representations and specific pre-trained models for this task [36][37][38]. Similar to our work, some papers [37,38] describe the construction of task-specific word embeddings, using techniques such as Word2Vec or GloVe to construct low-dimensional word embeddings. These works also utilize unlabeled data extracted from social networks using specific hateful queries. Other works describe the construction of task-specific word embeddings using different techniques. For example, Badjatiya et al. [1] use an LSTM model for building word embeddings from a dataset of labeled hateful tweets. There have also been efforts to adapt existing pre-trained models, as in the case of HateBERT [39], where the authors retrained BERT using English comments from banned Reddit communities.
Despite these efforts in the monolingual, mainly English, scenario, current cross-lingual techniques often use general-purpose features or general pre-trained models. To our knowledge, no prior work has created specialized multilingual representations (word embeddings) for this problem. Considering the particularities of the hate speech phenomenon, we expect domain-specific representations to improve cross-lingual classification, and we focus our work on this. In contrast to established methodologies, our approach addresses multilingual hate speech classification through the construction of specific representations.

Projection-based multilingual word embeddings
One approach for creating multilingual embeddings is the so-called projection technique [40][41][42]. This requires resources that are relatively easy to obtain for most tasks. The idea is to linearly project two vector spaces into a common space by optimizing the relationship between dictionary-paired vectors obtained from bilingual dictionaries. The bilingual dictionaries can be induced from the data (unsupervised methods) [29,43,44] or provided beforehand (supervised methods) [9,41,45,46].
We select this type of approach for the creation of our multilingual hate speech embeddings.

Hate-speech specific word embeddings
In this section, we describe the process of creating domain-specific word representations for hate speech in social media. Our methodology is divided into two steps: 1. the creation of domain-specific monolingual word embeddings for each separate language (detailed in Section 3.1), and 2. the alignment of the monolingual word embeddings into a single embedding space using bilingual dictionaries (detailed in Section 3.2).
We consider this to be a weakly supervised approach since it assumes the existence of i) a bilingual dictionary, of at least a few terms, to go from one language to another, and ii) a small hate speech lexicon for each target language. In particular, dictionaries allow us to forgo the need for large amounts of parallel or labeled data [47]. We use an off-the-shelf dictionary (described in Section 4.2).
Another consideration is that the construction of monolingual word embeddings requires a significant amount of unlabeled data. However, in comparison to the huge volume of data that is needed to train general-purpose word representations, our required data is quite small. Moreover, unlabeled data can be retrieved easily from social media using a small set of domain-specific (i.e., hateful) seed terms. We detail each step next.

Domain-specific monolingual embeddings
We describe the creation of monolingual word embeddings from social media. First, using a set of seed terms (or queries), we retrieve a set of social media text messages. In particular, we use the online social network Twitter, which is a microblog (i.e., short-text-based) platform. Our seeds are based on terms contained in public lexicons of hateful words [11,48]. These seeds guarantee that the retrieved messages contain hateful terms, causing, as an overall effect, a higher probability for hateful messages to appear in the resulting corpora. We focus on English, Spanish, and Italian. Specifically, we collected 30 million tweets in English and 10 million tweets each in Spanish and Italian.
Using this corpus, we train 100-dimensional Word2Vec [49] word embeddings for each language individually. By applying the Word2Vec technique, we obtain (mathematically) close vectors for semantically similar words. Since our corpus is biased towards hateful content, thanks to the weak supervision provided by the seed terms, we consider our resulting embeddings to be domain (hate) specific. Note that the same reasoning holds for any other algorithm used to create the monolingual embeddings.
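A minimal sketch of these two steps is given below. It assumes the messages are already available as plain strings (the Twitter retrieval itself is omitted), uses a naive regular-expression tokenizer, and relies on the gensim library for Word2Vec. The function names, the placeholder seed terms, and every hyperparameter other than the 100 dimensions are illustrative assumptions, not the settings of our pipeline.

```python
import re

def tokenize(text):
    """Naive lowercase tokenizer (an assumption; real tweets need more care)."""
    return re.findall(r"\w+", text.lower())

def filter_by_seeds(messages, seeds):
    """Keep only the messages that contain at least one seed term."""
    seeds = {s.lower() for s in seeds}
    kept = []
    for message in messages:
        tokens = tokenize(message)
        if seeds.intersection(tokens):
            kept.append(tokens)
    return kept

def train_monolingual_embeddings(tokenized_messages):
    """Train 100-dimensional Word2Vec vectors for one language.

    Requires gensim (>= 4.0); window and min_count are illustrative values.
    """
    from gensim.models import Word2Vec
    model = Word2Vec(sentences=tokenized_messages, vector_size=100,
                     window=5, min_count=5, workers=4)
    return model.wv  # KeyedVectors: maps each word to its vector

# Toy usage of the filtering step with placeholder seeds (not from a lexicon).
corpus = filter_by_seeds(
    ["They are all SEED_A here", "nice weather today", "seed_b again"],
    seeds=["seed_a", "seed_b"],
)
```

The training function would be called once per language on the filtered corpus, which yields one independent vector space per language.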

Alignment of monolingual spaces
In this section, we describe how different monolingual word embeddings are aligned into a single embedding space. As an alignment algorithm, we adopt a technique based on canonical correlation analysis (CCA) proposed by Faruqui and Dyer [9]. In this process, a pair of monolingual word vector sets is projected into a common space by learning two projection matrices $V$ and $W$ that maximize the correlation between the dictionary-paired projected vectors.
More specifically, assume that $X \in \mathbb{R}^{d \times n_1}$ and $Y \in \mathbb{R}^{d \times n_2}$ are two word-embedding matrices corresponding to two different languages, where every embedding is a column in each matrix. We note that $n_1$ and $n_2$ might be different, as the sets of embeddings might be created from vocabularies of different sizes.

Canonical variables for embeddings.
Further, assume that we have a set of $n$ translated terms between the two languages, and let $X'$ and $Y'$ be matrices in $\mathbb{R}^{d \times n}$ that are obtained by taking the columns from $X$ and $Y$, respectively, that correspond to the aligned translated terms.
The canonical variables for set $X'$ are denoted as $U = [U_1, U_2, \ldots, U_p]$, and for set $Y'$ as $V = [V_1, V_2, \ldots, V_p]$. The canonical variables are linear combinations of the original embeddings:
$$U_j = \sum_{i} a_{ij} X'_i, \qquad V_j = \sum_{k} b_{kj} Y'_k$$
Here, $a_{ij}$ and $b_{kj}$ are the coefficients to be determined through the canonical correlation analysis.

Objective function.
The goal is to maximize the correlation between the canonical variables U and V.
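As a hedged illustration, the textbook CCA objective for the $j$-th pair of canonical variables can be written as below. The coefficient vectors $a_j$ and $b_j$ (collecting the coefficients of the $j$-th canonical variable on each side) and the covariance notation $\Sigma$ (estimated from the dictionary-paired matrices) are notational assumptions introduced here, not taken from Faruqui and Dyer [9].

```latex
\max_{a_j,\, b_j} \; \rho_j = \operatorname{corr}(U_j, V_j)
  = \frac{a_j^{\top} \Sigma_{XY}\, b_j}
         {\sqrt{a_j^{\top} \Sigma_{XX}\, a_j}\;\sqrt{b_j^{\top} \Sigma_{YY}\, b_j}}
```

In this standard formulation, each successive pair is additionally constrained to be uncorrelated with the previous pairs, so that $\rho_1 \geq \rho_2 \geq \dots \geq \rho_p$.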

Projecting embeddings.
Once the canonical correlation analysis is performed, the canonical variables U and V provide the projections of the original embeddings into a shared subspace.
Using these two projection matrices, we can project the entire set of embeddings for both languages to obtain our final set of embeddings. With this method, we have aligned our initial word embeddings from step 1 (Section 3.1) into a single vector space. Following the described process, it is possible to replicate the algorithm for different languages given a set of hateful seeds. The hateful lexicon Hurtlex contains terms in 50 languages, and the process requires only basic computational resources. We include an implementation of the complete pipeline in our code repository.
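The alignment step can be sketched with a generic CCA solver written in NumPy. This is a minimal sketch under stated assumptions, not the implementation of Faruqui and Dyer [9] nor the one in our repository: it assumes at least as many dictionary pairs as dimensions, full-rank paired matrices, and no regularization, and the names `fit_cca`, `project`, `proj_x`, and `proj_y` are hypothetical.

```python
import numpy as np

def fit_cca(X0, Y0):
    """Fit CCA on dictionary-paired embeddings.

    X0, Y0: arrays of shape (d, n), one column per translation pair, n >= d.
    Returns one projection matrix per language, the column means used for
    centering, and the canonical correlations.
    """
    mean_x = X0.mean(axis=1, keepdims=True)
    mean_y = Y0.mean(axis=1, keepdims=True)
    A = (X0 - mean_x).T  # (n, d): rows are words
    B = (Y0 - mean_y).T
    # Whiten each view through its thin SVD, then align the whitened bases.
    Ua, Sa, Vat = np.linalg.svd(A, full_matrices=False)
    Ub, Sb, Vbt = np.linalg.svd(B, full_matrices=False)
    P, corr, Qt = np.linalg.svd(Ua.T @ Ub)
    proj_x = Vat.T @ np.diag(1.0 / Sa) @ P
    proj_y = Vbt.T @ np.diag(1.0 / Sb) @ Qt.T
    return proj_x, proj_y, mean_x, mean_y, corr

def project(E, proj, mean):
    """Project a full (d, n_words) embedding matrix into the shared space."""
    return proj.T @ (E - mean)

# Toy check: the second "language" is an exact rotation of the first one,
# so the two projected sets should coincide.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(5, 200))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Y0 = rotation @ X0
proj_x, proj_y, mean_x, mean_y, corr = fit_cca(X0, Y0)
aligned_x = project(X0, proj_x, mean_x)
aligned_y = project(Y0, proj_y, mean_y)
```

In practice, the projections are fitted only on the dictionary-paired columns and then applied to the full vocabulary of each language, which is what makes words outside the dictionary comparable across languages.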

Experiments and results
The goal of our experiments is to evaluate, quantitatively and qualitatively, the performance of our hate embeddings in comparison to existing general-purpose word representations. Specifically, we evaluate different settings using multilingual embeddings for English, Spanish, and Italian.

Datasets for evaluation
We used three labeled Twitter hate speech datasets for our experiments. A summary of these datasets is presented in Table 1. Each dataset is detailed next:
English dataset: This dataset consists of the English dataset by Arango et al. [26], created in 2019, and the one created for SemEval in 2019 by Basile et al. [50]. Both datasets contain hate speech against immigrants and women. They originated in the United States; therefore, hate targets, as well as specific terms, are framed within that particular cultural context.
Italian dataset: This dataset is composed of the dataset by Sanguinetti et al. [52], which is part of the "Hate Speech Italian Monitoring Program". The hate targets are women and immigrants.
Spanish dataset: This dataset consists of the dataset by Pereira et al. [51], which contains hate speech related to racism, sexism, and xenophobia. Additionally, we used the Spanish portion of the SemEval 2019 dataset by Basile et al. [50], which includes hate speech against immigrants and women. The tweets originated in Spain.
Datasets within the same language were merged, and their labels were binarized, following a commonly used strategy in this area for creating larger collections [6,53].

Bilingual dictionary
We use a bilingual dictionary consisting of word-aligned pairs from the Hurtlex lexicon [48]. Hurtlex is a multilingual lexicon that we use to match hateful terms between different languages. It has been successfully used for cross-domain hate speech detection [54,55]. The dictionaries comprise more than ten thousand words typically used in the hate speech domain and their translations to a second language. The particularity of Hurtlex is that it includes terms with different colloquial equivalents that are not usually included in generic dictionaries, as well as words that could typically appear in hateful content. An example comparing the diversity of Hurtlex with the MUSE dictionary can be found in our code repository. According to Shakurova et al. [56], better results are obtained when the bilingual lexicon is from the specific domain of the task.

Quantitative evaluation
Overall, we study cross-language classification in a transfer learning scenario, in which the classifier is trained on labeled data for one language and is then used to classify in another.
Algorithm 1 describes the experimental process. We used the three datasets described in Section 4.1, in English (EN), Spanish (SP), and Italian (IT). For each of the possible training-testing combinations of the three datasets (SETUPS), we compare different multilingual embeddings (EMBEDDINGS), including our proposal. To perform the comparison, we tested several models (MODELS) and reported the best performance by setup and by word embedding (RESULTS). In Tables 2 and 3 we show monolingual and cross-lingual results. Next, we detail every section of the process.
SETUPS. We test the nine possible source and target language combinations (e.g., SP → IT). Each dataset described in Section 4.1 is split into three sets: training (80%), testing (10%), and validation (10%), the latter for adjusting the hyper-parameters of the models. Even though our main interest is the cross-lingual scenario, we also include monolingual experiments to have a reference performance for the models. Intuitively, the closer the cross-lingual results are to the monolingual ones, the better the models are at transferring knowledge from one language to another.
Multilingual embeddings can be used in monolingual scenarios, although their alignment is not useful in this case.
Algorithm 1 Experimental process for comparing our hateful embeddings with general purpose word embeddings.
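The experimental loop described in the text (SETUPS, EMBEDDINGS, MODELS, RESULTS) can be sketched as follows. This is an illustrative reading of that description rather than a transcription of Algorithm 1: `run_experiments` and the `train_and_score` callable are hypothetical names, and the dummy scorer exists only to make the example executable.

```python
from itertools import product

LANGUAGES = ["EN", "SP", "IT"]

def run_experiments(embeddings, models, train_and_score):
    """Return the best F1 per (source, target, embedding) over all models.

    train_and_score(source, target, embedding, model) is a hypothetical
    callable that trains on the source-language training split and returns
    the F1 score on the target-language test split.
    """
    results = {}
    for source, target in product(LANGUAGES, repeat=2):  # nine setups
        for embedding in embeddings:
            best = max(
                train_and_score(source, target, embedding, model)
                for model in models
            )
            results[(source, target, embedding)] = best
    return results

# Toy usage with a dummy scorer (the values are meaningless placeholders).
def dummy_scorer(source, target, embedding, model):
    return 0.5 + 0.1 * (source == target) + 0.01 * len(model)

results = run_experiments(["HATE_EMB", "MUSE"], ["LR", "LSTM"], dummy_scorer)
```

Setups where source and target coincide correspond to the monolingual reference experiments; the remaining six are the cross-lingual ones.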

EMBEDDINGS.
We consider five types of multilingual embeddings: MUSE [29], a set of general embeddings aligned for multilingual contexts; BERT [18] and XLM [57], general-purpose pre-trained models for NLP that can be used to produce embeddings for sentences (sequences of words); LASER [32], a recent model for producing multilingual sentence embeddings; and HATE_EMB, our proposed embeddings described in Section 3.
In the cross-lingual setups, we use multilingual BERT (mBERT for short). In addition, we use BERT models pre-trained on monolingual data in Italian [24], Spanish [58], and English [18], which we fine-tune on the specific task of hate speech detection. We evaluate the usefulness of these representations by using them to generate input features for several classification models. At the same time, they serve as baselines for comparison with the hate embeddings.
MODELS. We evaluate several traditional machine learning models, including Logistic Regression (LR), XGBoost (XGB), Support Vector Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB) classifiers. We also incorporate deep learning models, including Convolutional Neural Networks (CNN), Feedforward Neural Networks (FNN), Long Short-Term Memory Networks (LSTM), and Multi-Head Attention (MHATTN). In addition, we combine LSTM and CNN layers (LSTMCNN), as well as LSTM with Attention [59] (LSTMATTN). We tuned these models to find the best possible values for the different hyperparameter combinations (e.g., batch size, learning rate, dimensions, number of layers, etc.). The complete grid search and the best resulting parameters are described in a section of our code repository. In the case of monolingual setups, we perform fine-tuning for the pre-trained BERT models according to the setup. In this case, the BERT model serves as both input representation and classification model (BERT Italian [24], BERT Spanish [58], and BERT English [18]).

Results
Next, we detail our results for the monolingual and cross-lingual classification task, as described in Section 4.3.
Monolingual results. Our embeddings (HateEmb) are competitive with more complex input representations. They outperform MUSE embeddings in all configurations. Additionally, the results differ from those using mBERT by less than 1%.
In Table 2, we present the results (in terms of F1 score) obtained in monolingual evaluations for each setup and embedding representation. The F1 score corresponds to the best result obtained for each experimental setup and embedding representation, independently of the model used.
Moreover, we include the results of fine-tuning the corresponding BERT model (Finetuned BERT).
In monolingual experiments, for the three datasets we considered, the transformer-based models XLM and BERT show the best performances.
Another important observation is that hate embeddings yield similar results compared to the BERT multilingual embeddings. Considering that the state-of-the-art language models were trained with a massive amount of data and that their learning process involved a much higher number of parameters, we believe that our hate embeddings show encouraging results.
In general, the Italian experiments yield the worst results, as expected, since we have less data available in this language.
Cross-lingual results. The hate embeddings (HateEmb) show the best results, outperforming the general-purpose ones in three of the six experimental setups. This is noteworthy since hate embeddings are created with very little data in comparison to the other more complex models. Table 3 shows the results of the cross-lingual experiments using several different input representations.
As we have mentioned, this setting is called zero-shot multilingual transfer learning, since no data from the target language is used during training. This is, arguably, the most challenging multilingual transfer learning task. Our proposed hate embeddings outperformed LASER and MUSE embeddings in all configurations. MUSE was constructed with a similar approach, aligning monolingual spaces. The improvement in the performance of HateEmb can be explained by the nature of the data used in training the monolingual vectors and the specialized bilingual dictionary.
Additionally, the hateful embeddings achieved the best performance in three of the six configurations and were outperformed in the remaining ones by BERT and XLM when these models were used to generate the input representations.
In those cases, our approach ranked second best. We consider this to be a good result, given that BERT and XLM are huge models that require training millions of parameters and specialized hardware. In contrast, our embeddings are extremely lightweight and can be trained on general-purpose machines.
One of the possible reasons for our embeddings to outperform more sophisticated ones is that they are trained specifically on social media text containing hate speech. In contrast, general-purpose embeddings like MUSE are trained on diverse corpora that may not capture the nuances of hate speech well. This domain-specific training allows hate embeddings to capture subtle linguistic cues specific to hate speech, leading to better performance in hate speech detection tasks. General-purpose embeddings trained on diverse corpora may contain noise from non-hate speech contexts, which can degrade performance in hate speech detection tasks. Hate embeddings, trained on hate speech data, are less likely to suffer from this noise, resulting in improved accuracy.
In addition, the hate embeddings are aligned using a bilingual dictionary specifically tailored for hate speech, namely Hurtlex. This allows hate embeddings to capture cross-lingual semantic information relevant to hate speech, leading to improved performance in cross-lingual hate speech detection tasks compared to general-purpose embeddings.

Qualitative evaluation
The intrinsic quality of multilingual word embeddings is usually evaluated based on the Bilingual Lexicon Induction (BLI) task [60]. This task measures how close the vectors representing translations in different languages are to each other. BLI relies on nearest neighbor search, identifying the most similar word in the target language given a word in a source language. The target and source words are expected to be translations of each other according to a validation dictionary. Quantitative metrics such as precision and recall can then be computed over these results [29].
There are several difficulties in directly applying a BLI-like quantitative approach to assess the intrinsic quality of our embeddings. The main difficulty is that hate speech is a problem where word meanings often extend well beyond their literal translations. Thus, a low BLI score for general terms does not necessarily mean low quality for hate speech detection. We therefore present a custom qualitative analysis based on the idea of BLI-like tests.
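To make the BLI procedure concrete, the following minimal sketch retrieves the cosine nearest neighbor in the target language and computes precision at 1 against a validation dictionary. The two-dimensional toy vectors, the word lists, and the function names are illustrative assumptions; real evaluations use the aligned embedding spaces and standard validation dictionaries.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbor(word, source_vectors, target_vectors):
    """Most similar target-language word for a source-language word."""
    query = source_vectors[word]
    return max(target_vectors, key=lambda t: cosine(query, target_vectors[t]))

def precision_at_1(validation_dict, source_vectors, target_vectors):
    """Fraction of source words whose nearest neighbor is a gold translation."""
    hits = sum(
        nearest_neighbor(src, source_vectors, target_vectors) in gold
        for src, gold in validation_dict.items()
    )
    return hits / len(validation_dict)

# Toy aligned spaces (values invented for illustration only).
italian = {"gatto": [1.0, 0.1], "cane": [0.1, 1.0]}
english = {"cat": [0.9, 0.2], "dog": [0.2, 0.9], "car": [0.7, 0.7]}
gold = {"gatto": {"cat"}, "cane": {"dog"}}
score = precision_at_1(gold, italian, english)
```

The same nearest-neighbor routine underlies any BLI-like qualitative inspection: instead of scoring against a dictionary, one simply reads the retrieved neighbors.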
As a result of our qualitative evaluation, new cross-lingual relations between words emerged.

Cross-lingual relations in vector spaces
BLI inspires our first qualitative evaluation, which extracts the most related terms across different languages given a seed term. This exercise aims to identify equivalent hateful meanings rather than direct translations in both vector spaces and within the labeled datasets.
In Table 4, we present a sample of the relationships between terms, comparing the hate embeddings (HateEmb) with the general-purpose multilingual embedding MUSE.
We manually selected these terms to represent groups that have been targets of hate, ensuring they were not in the bilingual dictionary used for embedding alignment. This allowed us to surface new relationships.
For each selected source term, we show the nearest neighbor (NN) terms in a language different from the source language. In most cases, the nearest neighbors in the MUSE space are terms whose standard meanings are the same in both languages. For example, in Table 4, for the Italian word "migranti", we found that the nearest terms in the MUSE space are "migrants" in English and "migrantes" in Spanish. These are literal translations across the three languages. On the other hand, the nearest neighbors in the hate-specific embedding space are "illegals" (English) and "palestinos" (Spanish).
We can observe that these are neither direct nor neutral translations of the original word "migranti". However, these words are likely to appear in similar contexts in hateful text. Within this scenario, the term "illegals" is commonly used to refer to a person who migrates to the U.S. illegally. Similarly, the word "palestinos" (which means a person of Palestinian origin) is associated in hate-related contexts with a person who is an immigrant from an Arab country. We argue that evaluating the hate embeddings considering literal translations is not suitable, since in hateful content, words like "migranti" could acquire different meanings. The correct equivalence that should have been found is unknown, due to the complexity of the hate speech problem. Moreover, expecting the same relationships across different languages (e.g., "migrants" - "terrorist" = "migrantes" - "terroristas") is not correct either. The targets of hate differ across languages depending on the socio-cultural scenario. In most cases, we were able to observe non-trivial translations when exploring our hate embeddings. However, in a few cases, we could observe that the equivalences are the same as in MUSE (e.g., "negros" in Spanish, as "blacks" in English and "neri" in Italian). More experimentation is needed to derive a more robust conclusion, but our qualitative results provide positive evidence that our domain-specific embeddings capture non-trivial meanings and translations.

Cross-lingual relations in labeled datasets
In this section, we introduce a method for qualitatively exploring the ability of our embeddings to capture equivalences between hateful concepts in different languages over a labeled dataset. In the previous section, we used similarity measures (nearest neighbors) over the general embedding space, considering all the vocabulary used to construct those embeddings (unlabeled general data). In this section, we focus exclusively on texts from the positive class of hate speech labeled datasets in different languages. That is, we focus on multilingual data that we know contains hateful information. We use the hate embeddings plus association rules to devise a similarity measure among terms in different languages. This experiment serves as an intrinsic qualitative evaluation, as we can assess how good the translations obtained for simple hateful terms are. We next explain in more detail the method we devised to obtain the equivalences.

Association rules and word contexts.
For the first step, let $x$ be a word and $U$ a set of words, all from one of the labeled datasets. From each dataset, we extract association rules of the form $\{x\} \Rightarrow U$ with the following semantics: if $x$ occurs in a text $T$, then $U \subseteq T$ with certain confidence [61]. In that way, we can find words that usually occur together in the same text. We extract rules for the most frequent terms $x$ in each dataset and impose lower bounds on confidence and support.
The support measures how often a specific itemset or item appears in a dataset, that is, its level of popularity. A higher value for support means that the itemset or item occurs more frequently within the dataset. Confidence refers to the degree of reliability or strength in the relationship between two itemsets or items. This measure is determined by comparing the number of transactions (tweets in our scenario) that contain both the antecedent and consequent itemsets to the number of transactions that contain the antecedent. A higher confidence value suggests a more significant correlation between the two itemsets or items [62].
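As a small worked example of these two measures, the sketch below computes support and confidence for rules with a single-word consequent and keeps those above given thresholds. Restricting to single-word consequents, the threshold values, and the function names are simplifying assumptions made for illustration; they do not reproduce the rule-mining configuration used in our experiments.

```python
from collections import Counter

def rule_metrics(transactions, x):
    """Support and confidence of rules {x} => {u} for every co-occurring u.

    transactions: list of token collections (one per tweet).
    Returns {u: (support, confidence)}.
    """
    transactions = [set(t) for t in transactions]
    n_total = len(transactions)
    with_x = [t for t in transactions if x in t]
    counts = Counter(u for t in with_x for u in t if u != x)
    return {u: (c / n_total, c / len(with_x)) for u, c in counts.items()}

def context(transactions, x, min_support, min_confidence):
    """Words that co-occur with x above both thresholds, with their metrics."""
    metrics = rule_metrics(transactions, x)
    return {u: m for u, m in metrics.items()
            if m[0] >= min_support and m[1] >= min_confidence}

# Toy transactions: each inner list stands for the tokens of one tweet.
tweets = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "d"]]
ctx = context(tweets, "a", min_support=0.5, min_confidence=0.6)
```

Here the word "b" appears with "a" in two of the four tweets (support 0.5) and in two of the three tweets containing "a" (confidence 2/3), so it is the only word retained.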
We note that one can obtain many different association rules in each dataset and for each frequent term $x$. Using all the rules of the form $\{x\} \Rightarrow U_i$, we compute the context of the word $x$ as $C(x) = \bigcup_i U_i$. Our similarity measure for two words is based on the similarity of contexts for those words, as we explain next.

Context-based similarity measure.
We still need to introduce some additional notation to present our similarity measure.For every word u 2 C(x) we denote by supp x (u) and conf x (u) the support and confidence of term u in the association rule of the form {x})U it appears.Given words u and v, appearing in contexts say C(x) and C(y), respectively, we define the following expression that essentially compares their support and confidence metrics in their respective contexts: We call this expression the metrics similarity between u and v and we denote it by met-sim (u, v).We combine the above similarity for context words with a usual embedding similarity based on cosine similarity by averaging both to obtain a combined similarity: simðu; vÞ ¼ ðcos À simðu; vÞ þ met À simðu; vÞÞ=2 That is, we give the same importance to how similar the vectors are (cos-sim), and how similar the importance (confidence and support) of the association rules they appear in are (metsim).
We now have all the ingredients to define the context similarity of words. Let x and y be words with contexts A = C(x) and B = C(y), respectively. Then, their context similarity, denoted by cont-sim(x, y), is defined as

$$\text{cont-sim}(x, y) = \frac{1}{2}\left(\frac{1}{|A|}\sum_{u \in A}\max_{v \in B}\mathrm{sim}(u, v) + \frac{1}{|B|}\sum_{v \in B}\max_{u \in A}\mathrm{sim}(u, v)\right)$$

That is, for every word in x's context (A), we compute its maximum similarity with the words in y's context (B) and take the mean over all those similarities; we then do the same in the other direction. The results of both directions are averaged.
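The combined similarity and the two-directional averaging described above can be sketched as follows. This is an illustrative reading of the prose, not the authors' implementation: `met_sim` is passed in as a caller-supplied function because its exact expression is not reproduced here, and `vectors` is a hypothetical lookup from words to embedding vectors.

```python
import math
from typing import Callable, Collection, Dict, Sequence

Vector = Sequence[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def cont_sim(context_a: Collection[str], context_b: Collection[str],
             vectors: Dict[str, Vector],
             met_sim: Callable[[str, str], float]) -> float:
    """Average of best-match similarities from A to B and from B to A."""
    if not context_a or not context_b:
        return 0.0  # assumption: empty contexts are treated as dissimilar

    def sim(u: str, v: str) -> float:
        # Equal weight for embedding similarity and metrics similarity.
        return (cos_sim(vectors[u], vectors[v]) + met_sim(u, v)) / 2

    a_to_b = sum(max(sim(u, v) for v in context_b) for u in context_a) / len(context_a)
    b_to_a = sum(max(sim(u, v) for u in context_a) for v in context_b) / len(context_b)
    return (a_to_b + b_to_a) / 2


# Hypothetical toy input with a constant placeholder for met-sim.
toy_vectors = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 0.0)}
score = cont_sim(["a", "b"], ["c"], toy_vectors, lambda u, v: 1.0)
```

Taking the maximum per word means each context word only needs one good counterpart in the other context, so the measure tolerates contexts of different sizes.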

Context-based similarity for hateful words.
For each dataset of every language, we first selected some frequent words (seed terms) appearing in the hateful-labeled texts. Then, for each seed, we selected the most (context-)similar words among all the words appearing in hateful-labeled texts in a different language. Table 5 shows examples of terms and their top two most similar words using our hate-specific embeddings, and Table 6 shows the results of a similar experiment using MUSE embeddings. As a comparison, the tables also show a similar experiment for the non-hate texts. More examples can be found in our repository.
Table 5. For some terms, we found the most similar ones in different languages using the HateEmb embeddings for hateful and non-hateful classes. We can observe different relations depending on the nature of the expressions. The numbers represent the similarity achieved in each case (%).

Even though the labeled datasets are relatively small and cover specific types of hate, we still find interesting cross-lingual relations. As expected, these relations differ depending on the nature of the text (hateful versus non-hateful).

For example, using our hate embeddings, for the Italian word "terroristi", which is a neutral translation of the English "terrorists", we found the words "muslims" and "fascistas" (see Table 5). In particular, "fascistas" is an adjective related to fascism and is used pejoratively. Another example that illustrates the results of using hate embeddings is that the most similar word to "girls" in English was "perra" in Spanish. This last term is a very demeaning way to refer to women in that language.
On the other hand, for MUSE embeddings (see Table 6), the closest term to "girl" is "gusta" (like), which is not semantically related. In addition, using our hate embeddings, the most similar term to "gitano", which means gypsy in Spanish, was "invaders". Using MUSE embeddings, however, the closest word is "hopefully", which is not meaningfully related.
The relationships that we have found using our hate embeddings can be interpreted as a cross-cultural similarity in how concepts are related to each other within hateful contexts. Furthermore, these relationships, although only qualitative, are very difficult to find when we repeat this experiment with general-purpose multilingual embeddings.

Limitations
CCA, the method used for aligning the word embeddings into a multilingual space, relies on two resources: unlabeled data for constructing monolingual embeddings and bilingual dictionaries for the alignment process. The first resource is relatively easy to obtain, but bilingual dictionaries may be unavailable for certain languages.
The dictionary used in this paper, Hurtlex, contains equivalences relative to the hate speech phenomenon in 50 languages. However, the technique's effectiveness may still depend on the specific characteristics of these dictionaries, such as the number and quality of their equivalences. Nevertheless, CCA's applicability to low-resource languages can be improved with additional strategies. For example, the model can be refined after the multilingual embeddings are created: the initial dictionary can be augmented by inferring additional bilingual equivalences from these vectors. This expanded dictionary enables another iteration of the method, and the process can be repeated multiple times to achieve improved embeddings.

Another limitation of projecting multilingual embeddings, as well as of using pre-trained language models, is the risk of introducing biases. The training data used to create the monolingual embeddings may contain stereotypes or discriminatory content that can reinforce cultural prejudices and affect the performance of hate speech detectors. In addition, the projection techniques, which rely on statistical correlations between language embeddings, may introduce a skewed representation of languages depending on the bilingual dictionaries used.
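The dictionary-augmentation step mentioned above could be sketched as follows. The text does not say how new equivalences are inferred, so this example assumes a mutual nearest-neighbour criterion with a cosine threshold, a common choice for this kind of refinement. The function names, the `min_sim` parameter, and the toy vectors are all hypothetical, and re-running the alignment with the expanded dictionary is left to the caller.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def nearest(vec: Vector, candidates: Dict[str, Vector]) -> str:
    """Word in candidates whose vector is most cosine-similar to vec."""
    return max(candidates, key=lambda w: cosine(vec, candidates[w]))


def augment_dictionary(seed_pairs: Set[Tuple[str, str]],
                       src: Dict[str, Vector], tgt: Dict[str, Vector],
                       min_sim: float = 0.8) -> Set[Tuple[str, str]]:
    """Add (source, target) pairs that are mutual nearest neighbours.

    src and tgt hold vectors already projected into the shared space.
    Assumption: a pair is accepted only if each word is the other's nearest
    neighbour and their cosine similarity is at least min_sim.
    """
    augmented = set(seed_pairs)
    for s, s_vec in src.items():
        t = nearest(s_vec, tgt)
        if nearest(tgt[t], src) == s and cosine(s_vec, tgt[t]) >= min_sim:
            augmented.add((s, t))
    return augmented


# Hypothetical aligned vectors for a few Italian and English words.
src_vectors = {"cane": (1.0, 0.0), "gatto": (0.0, 1.0), "casa": (0.7, 0.7)}
tgt_vectors = {"dog": (0.9, 0.1), "cat": (0.1, 0.9)}
expanded = augment_dictionary({("cane", "dog")}, src_vectors, tgt_vectors)
```

The expanded set can then serve as the dictionary for another alignment round, and the loop can stop once no new pairs are added.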

Summary & conclusions
We have presented a detailed analysis of cross-lingual hate speech classification aimed at transferring knowledge from one (or more) language to another.
Although simple, our proposed technique outperformed more complex ones. Leveraging domain-specific cross-lingual resources, which has been largely unexplored, could be a promising direction for this task.
We summarize our main findings as follows:
• Hate embeddings demonstrate competitive performance for monolingual classification compared to general-purpose data representations. As shown in Table 2, our embeddings outperform MUSE for all languages. Additionally, we achieve performance similar to BERT while requiring significantly fewer training resources.
• Hate embeddings are effective for cross-lingual classification, as shown in Table 3. They outperform other approaches in 4 out of 6 configurations: EN → ES, EN → IT, ES → EN, and IT → EN. In the remaining experiments, they are the second-best performing approach.
• Our hate embeddings enable the extraction of significant multilingual semantic relationships in hateful contexts, not limited to literal translations as with other general-purpose multilingual embeddings (Tables 4-6). This indicates that the context of words in a hateful scenario differs significantly from their context in a general scenario. Moreover, these relationships enhance currently available lexical resources.
The performance of hate embeddings compared to much more sophisticated general-purpose representations suggests that they can effectively capture domain-specific information critical for hate speech detection.
Overall, there appear to be cross-cutting patterns in hate speech that transcend languages.Furthermore, knowledge transfer from one language to another is expected to contribute to the improvement of hate speech detection models in any language, reducing the need for massive amounts of labeled data.
As future directions, we will explore other algorithms for creating domain-specific representations for hate speech. Additionally, we will study how cultural differences affect hate speech detection, even within the same language.

Table 1. Description of the datasets used in our evaluation.
For each dataset, we show the number of tweets per class.

Table 2. Experimental results for monolingual hate speech detection.
The cells show the F1 score of the best model for each combination of setup and input representation (embedding). The bold numbers represent the best score per setup.

Table 3. Experimental results for cross-lingual hate speech detection.
The cells show the F1 score of the best model for each combination of setup and input representation (embedding). The bold numbers represent the best score per setup.

Table 6. Results for an experiment similar to the one presented in Table 5, but considering the general-purpose MUSE multilingual embeddings instead of our hate-specific embeddings.
We can observe different relations depending on the nature of the expressions. The numbers represent the similarity achieved in each case (%). https://doi.org/10.1371/journal.pone.0306521.t006